Abstract. A run is a maximal occurrence of a repetition $v$ with a period
$p$ such that $2p \le |v|$. The maximal number of runs in a string of length $n$
has been studied by several authors, and it is known to be between $0.944n$ and
$1.029n$. We investigate highly periodic runs, in which the shortest period
$p$ satisfies $3p \le |v|$. We show the upper bound $0.5n$ on the maximal
number of such runs in a string of length $n$ and construct a sequence of
words for which we obtain the lower bound $0.406n$.



1 Introduction 

Repetitions and periodicities in strings are among the fundamental topics in
combinatorics on words [2, 13]. They are also important in other areas: lossless
compression, word representation, computational biology, etc. Repetitions are
studied from different directions: classification of words not containing
repetitions of a given exponent, efficient identification of factors that are
repetitions of different types and, finally, computing bounds on the number of
repetitions of a given exponent that a string may contain, which is the subject
of this paper. Both the known results on the topic and a deeper description of
the motivation can be found in the survey by Crochemore et al. [5].

The concept of runs (also called maximal repetitions) was introduced to
represent all repetitions in a string in a succinct manner. The crucial property of
runs is that their maximal number in a string of length $n$ (denoted $\text{runs}(n)$) is
$O(n)$ [10]. Due to the work of many people, much better bounds on $\text{runs}(n)$ have
been obtained. The lower bound $0.927n$ was first proved in [8]. Afterwards, it
was improved by Kusano et al. [12] to $0.944n$ using computer experiments,
and very recently by Simpson [18] to $0.944575712n$. On the other hand, the first
explicit upper bound $5n$ was established in [15]; afterwards it was systematically
improved to $3.44n$ [17], $1.6n$ [3, 4] and $1.52n$ [9]. The best known result, $\text{runs}(n) <
1.029n$, is due to Crochemore et al. [6], but it is conjectured [10] that $\text{runs}(n) < n$.
The maximal number of runs was also studied for special types of strings, and
tight bounds were established for Fibonacci strings [10, 16] and, more generally,
Sturmian strings [1].

The combinatorial analysis of runs in strings is strongly related to the problem
of estimating the maximal number of occurrences of squares in a string.
For the latter, the gap between the upper and lower bounds is much larger than
for runs [5, 7]. However, a recent paper [11] by some of the authors shows that
considering exponents larger than 2 can lead to tighter bounds on
the number of corresponding occurrences.

In this paper we introduce and study the concept of highly periodic runs
(hp-runs), whose length is at least three times their shortest period. We
show the following bounds on the maximal number $\text{hp-runs}(n)$ of such runs in a string of
length $n$:

$$0.406n < \text{hp-runs}(n) \le \frac{n-1}{2}.$$

The upper bound is obtained by analyzing prime words (i.e. words that are
primitive and minimal or maximal in the class of their cyclic equivalents) that
appear as periods of hp-runs. As for the lower bound, we give a simple argument
that leads to a $0.4n$ bound and then describe a family of words that improves this
bound to $0.406n$.

2 Definitions 

We consider words over a finite alphabet $A$, $u \in A^*$; by $\varepsilon$ we denote the empty
word; the positions in a word $u$ are numbered from 1 to $|u|$. By $\mathrm{Alph}(u)$ we
denote the set of all letters of $u$. For $u = u_1 u_2 \cdots u_m$, by $u[i..j]$ we denote the
factor of $u$ equal to $u_i \cdots u_j$ (in particular $u[i] = u[i..i]$). Words $u[1..i]$ are
called prefixes of $u$, and words $u[i..m]$ suffixes of $u$. We say that a positive
integer $p$ is the (shortest) period of a word $u = u_1 \cdots u_m$ (notation: $p = \mathrm{per}(u)$)
if $p$ is the smallest number such that $u_i = u_{i+p}$ holds for all $1 \le i \le m - p$.
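The definition of $\mathrm{per}(u)$ can be checked mechanically. The following Python sketch is an illustration only, assuming words are Python strings; it relies on the standard fact that the shortest period equals $|u|$ minus the length of the longest proper border (a non-empty proper prefix that is also a suffix), computed with the Knuth–Morris–Pratt failure function. The names `per` and `per_naive` are hypothetical.

```python
def per(u: str) -> int:
    """Shortest period of a non-empty word u.

    Uses per(u) = |u| - (length of the longest proper border of u), where the
    border lengths come from the Knuth-Morris-Pratt failure function.
    """
    m = len(u)
    if m == 0:
        raise ValueError("per(u) is defined for non-empty words only")
    border = [0] * m  # border[i] = length of the longest proper border of u[:i + 1]
    k = 0
    for i in range(1, m):
        while k > 0 and u[i] != u[k]:
            k = border[k - 1]
        if u[i] == u[k]:
            k += 1
        border[i] = k
    return m - border[-1]


def per_naive(u: str) -> int:
    """Direct transcription of the definition: smallest p with u_i = u_{i+p} for all i."""
    m = len(u)
    return next(p for p in range(1, m + 1)
                if all(u[i] == u[i + p] for i in range(m - p)))


print(per("abaab"), per("aaaa"), per("abab"))  # 3 1 2
```

The border-based version runs in linear time, while `per_naive` mirrors the definition literally and is convenient for cross-checking on short words.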

If $w^k = u$ ($k$ is a non-negative integer), then we say that $u$ is the $k$-th power of
the word $w$. A square is the 2nd power of some word. The primitive root of a word
$u$, denoted $\mathrm{root}(u)$, is the shortest word $w$ such that $w^k = u$ for some positive $k$.
We call a word $u$ primitive if $\mathrm{root}(u) = u$; otherwise it is called nonprimitive. We
say that words $u$ and $v$ are cyclically equivalent (or that one of them is a cyclic
rotation of the other) if $u = xy$ and $v = yx$ for some $x, y \in A^*$. It is a simple
observation that if $u$ and $v$ are cyclically equivalent then $|\mathrm{root}(u)| = |\mathrm{root}(v)|$;
in particular, $u$ is primitive if and only if $v$ is.
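A small Python sketch can make the notions of primitive root and cyclic rotation concrete. It is an illustration under the assumption that words are Python strings; `root`, `is_primitive` and `rotations` are hypothetical helper names. The example also shows why the observation is about the *length* of the roots: `abab` and `baba` are cyclically equivalent but have different roots.

```python
def root(u: str) -> str:
    """Primitive root of a non-empty word: the shortest w with w^k = u for some k >= 1."""
    m = len(u)
    for d in range(1, m + 1):
        if m % d == 0 and u[:d] * (m // d) == u:
            return u[:d]
    return u  # only reached when u is empty


def is_primitive(u: str) -> bool:
    return root(u) == u


def rotations(u: str) -> list:
    """All cyclic rotations yx of u = xy (listed with repetitions if u is not primitive)."""
    return [u[i:] + u[:i] for i in range(len(u))]


# Cyclically equivalent words need not have equal roots, but their roots have equal length.
print(root("abab"), root("baba"))                      # ab ba
print({len(root(v)) for v in rotations("abaaba")})     # {3}
```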

Let us assume that $A$ is totally ordered by $<$, which induces a lexicographical
order on $A^*$, also denoted by $<$. We say that $u \in A^*$ is a prime word if it
is primitive and minimal or maximal in the class of words that are cyclically
equivalent to it. It can be proved [13] that a prime word $u$ cannot have a proper
(i.e. non-empty and different from $u$) prefix that is also its suffix.
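The border-freeness of prime words can be observed experimentally. The sketch below is an assumption-laden illustration: words are Python strings, the order on letters is Python's built-in character order (so `'a' < 'b' < 'c'`), and `is_prime` / `has_proper_border` are hypothetical names. It uses the fact that a word is primitive exactly when its $|u|$ rotations are pairwise distinct.

```python
def is_prime(u: str) -> bool:
    """Prime word: primitive and lexicographically minimal or maximal among its rotations."""
    if not u:
        return False
    rots = [u[i:] + u[:i] for i in range(len(u))]
    primitive = len(set(rots)) == len(u)  # u is primitive iff its rotations are pairwise distinct
    return primitive and (u == min(rots) or u == max(rots))


def has_proper_border(u: str) -> bool:
    """True if some non-empty proper prefix of u is also a suffix of u."""
    return any(u[:k] == u[-k:] for k in range(1, len(u)))


print([w for w in ("aab", "aba", "baa", "abab") if is_prime(w)])  # ['aab', 'baa']
print(has_proper_border("aba"), has_proper_border("aab"))          # True False
```

Running `has_proper_border` over all prime words of small length gives a quick empirical check of the cited property; it is of course not a substitute for the proof in [13].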

A run (also called a maximal repetition) in a string $u$ is an interval $[i..j]$
such that the associated factor $u[i..j]$ has shortest period $p$ with $2p \le j - i + 1$, and this
property cannot be extended to the right or to the left: $u[i-1] \ne u[i+p-1]$
and $u[j-p+1] \ne u[j+1]$ whenever these letters are defined. A highly periodic run
(hp-run) is a run $[i..j]$ for which the shortest period $p$ satisfies $3p \le j - i + 1$.
For simplicity, in the rest of the text we sometimes refer to runs or hp-runs as
occurrences of the corresponding factors of $u$.
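Runs and hp-runs of short words can be enumerated by brute force directly from this definition. The Python sketch below is an illustration, not an efficient algorithm (it takes quadratic time); `runs` and `hp_runs` are hypothetical names, and intervals are 0-based and end-exclusive. For every candidate period $p$ it finds the maximal stretches with $s[i] = s[i+p]$. If the same interval arises for several periods, the smallest one is its shortest period (by the Fine–Wilf theorem, since the interval has length at least $2p$).

```python
def runs(s):
    """All runs of s as {(start, end): shortest period}, 0-based with end exclusive."""
    n, found = len(s), {}
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            # s[i:j + p] is a maximal factor with period p; keep it if its length is >= 2p.
            # Periods are tried in increasing order, so the first p recorded is the shortest.
            if j + p - i >= 2 * p:
                found.setdefault((i, j + p), p)
            i = j
    return found


def hp_runs(s):
    """Number of hp-runs: runs whose shortest period p satisfies 3p <= length."""
    return sum(1 for (a, b), p in runs(s).items() if 3 * p <= b - a)


print(sorted(runs("aabaabaa").items()))  # [((0, 2), 1), ((0, 8), 3), ((3, 5), 1), ((6, 8), 1)]
print(hp_runs("aaa"))                    # 1
```

In `aabaabaa` the three squares `aa` and the whole word (period 3, length 8) are runs, but none of them is highly periodic.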

3 Upper bound 

Let $u \in A^*$ be a word of length $n$. By $P = \{p_1, p_2, \ldots, p_{n-1}\}$ we denote the set
of inter-positions of $u$ that are located between pairs of consecutive letters of $u$
($p_i$ lies between $u[i]$ and $u[i+1]$).

We define a function $F$ that assigns to each hp-run $v$ in a string a set of
handles chosen among all inter-positions within $v$. Hence, $F$ is a mapping from the set
of hp-runs occurring in $u$ to the set $2^P$ of subsets of $P$. Let $v$ be an hp-run with
period $p$ and let $w$ be the prefix of $v$ of length $p$. By $w_{\min}$ and $w_{\max}$ we denote the
words cyclically equivalent to $w$ that are minimal and maximal in lexicographical
order. We define $F(v)$ as follows:

a) if $w_{\min} \ne w_{\max}$ then $F(v)$ contains the inter-positions between consecutive
occurrences of $w_{\min}$ and between consecutive occurrences of $w_{\max}$ within $v$;

b) if $w_{\min} = w_{\max}$ then $F(v)$ contains all inter-positions within $v$.
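The definition of $F$ can be turned into a short brute-force sketch. In this Python illustration (hypothetical names `handles` and `hp_handles`; 0-based string indices), the inter-position $p_k$ is represented by the integer $k$, and a handle of $v$ is the middle of an occurrence of $w_{\min}^2$ or $w_{\max}^2$ inside $v$. For $p = 1$ this yields every inter-position inside $v$, i.e. case b). The run enumeration is a plain quadratic-time search.

```python
def runs(s):
    """Brute-force runs of s as {(start, end): shortest period}, 0-based, end exclusive."""
    n, found = len(s), {}
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            if j - i >= p:                          # maximal p-periodic factor of length >= 2p
                found.setdefault((i, j + p), p)     # smallest p first = shortest period
            i = j
    return found


def handles(s, start, end, p):
    """Handle set F(v) of the hp-run v = s[start:end] with shortest period p.

    Integer k stands for the inter-position between s[k - 1] and s[k] (p_k in 1-based notation).
    """
    w = s[start:start + p]
    rots = [w[i:] + w[:i] for i in range(p)]
    targets = {min(rots), max(rots)}                # {w_min, w_max}
    return {i + p for i in range(start, end - 2 * p + 1)
            if s[i:i + p] == s[i + p:i + 2 * p] and s[i:i + p] in targets}


def hp_handles(s):
    """Map every hp-run (start, end) of s to its handle set F(v)."""
    return {(a, b): handles(s, a, b, p) for (a, b), p in runs(s).items() if 3 * p <= b - a}


print({v: sorted(h) for v, h in hp_handles("aabaabaab").items()})  # {(0, 9): [3, 5, 6]}
```

Here $w = $ `aab`, $w_{\min} = $ `aab` and $w_{\max} = $ `baa`; the handles 3 and 6 come from the squares of `aab`, and 5 comes from the square of `baa`. Iterating `hp_handles` over all short words is a convenient way to observe Lemmas 3 and 4 empirically.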

Lemma 1. $w_{\min}$ and $w_{\max}$ are prime words.

Proof. By the definition of $w_{\min}$ and $w_{\max}$, it suffices to show that both words
are primitive. This follows from the fact that $w$ is primitive (otherwise $v$ would have
a period shorter than $p$, contradicting the minimality of $p$) and that $w_{\min}$ and $w_{\max}$
are cyclically equivalent to $w$. □

Lemma 2. Case b) of the above definition implies that $|w_{\min}| = 1$.

Proof. $w_{\min}$ is primitive; therefore, if $|w_{\min}| \ge 2$ then $w_{\min}$ would contain at
least two distinct letters, $a = w_{\min}[1]$ and $b = w_{\min}[i] \ne a$. If $b < a$ ($b > a$) then
the cyclic rotation of $w_{\min}$ by $i - 1$ letters would be lexicographically smaller
(greater) than $w_{\min} = w_{\max}$, a contradiction. □

Note that in case b) of the definition of $F$, $F(v)$ obviously contains at least two
distinct handles, since by Lemma 2 we have $p = 1$ and thus $|v| \ge 3$. The following
lemma implies that the same property also holds in case a).

Lemma 3. Each of the words $w_{\min}^2$ and $w_{\max}^2$ is a factor of $v$.

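Proof. Recall that $3p \le |v|$, where $p = \mathrm{per}(v)$. By Lemma 2, this concludes the
proof in case b). As for case a), it suffices to note that the first
occurrences of each of the words $w_{\min}$, $w_{\max}$ within $v$ start no further than $p$
positions from the beginning of $v$; since $v$ has period $p$, the squares $w_{\min}^2$ and
$w_{\max}^2$ starting there end at position at most $3p - 1 < |v|$ of $v$. □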
Fig. 1. Illustration of the definition of $F$ and Lemma 3. The arrows in the figure point
to positions from the set of handles $F(v)$.



We now show a crucial property of F. 

Lemma 4. $F(v_1) \cap F(v_2) = \emptyset$ for every two distinct hp-runs $v_1, v_2$ in $u$.

Proof. Assume to the contrary that $p_i \in F(v_1) \cap F(v_2)$ is a handle of two different
runs $v_1$ and $v_2$. By Lemmas 1 and 3, $p_i$ is located in the middle of two squares
$w_1^2$ and $w_2^2$ of prime words, where $|w_1| = \mathrm{per}(v_1)$ and $|w_2| = \mathrm{per}(v_2)$. Moreover, $w_1 \ne w_2$,
since otherwise the runs $v_1$ and $v_2$ would be the same; as both squares are centred at $p_i$,
this also gives $|w_1| \ne |w_2|$. W.l.o.g. assume
that $|w_1| < |w_2|$. Then the word $w_1$ is both a proper prefix and a proper suffix of $w_2$ (see Fig. 2),
which contradicts the primality of $w_2$. □



Fig. 2. A situation where $p_i$ is in the middle of two different squares $w_1^2$ and $w_2^2$.

The following theorem concludes the analysis of the upper bound. 
Theorem 1. A word $u \in A^*$ of length $n$ may contain at most $\frac{n-1}{2}$ hp-runs.

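Proof. Due to Lemma 3 (and the remark preceding it for case b)), $|F(v)| \ge 2$ for each
hp-run $v$ within $u$. By Lemma 4 the sets $F(v)$ are pairwise disjoint subsets of $P$, and
$|P| = n - 1$; hence $u$ contains at most $\frac{n-1}{2}$ hp-runs. □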
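Theorem 1 can be sanity-checked by exhaustive search on short words. The Python sketch below is a finite experiment under stated assumptions (small alphabets and lengths, chosen only to keep the search fast), not a proof; `max_hp_runs` is a hypothetical name and the run enumeration is the plain brute-force search from the definition.

```python
from itertools import product


def runs(s):
    """Brute-force runs of s as {(start, end): shortest period}, 0-based, end exclusive."""
    n, found = len(s), {}
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            if j - i >= p:                          # maximal p-periodic factor of length >= 2p
                found.setdefault((i, j + p), p)     # smallest p first = shortest period
            i = j
    return found


def hp_runs(s):
    """Number of runs whose shortest period p satisfies 3p <= length."""
    return sum(1 for (a, b), p in runs(s).items() if 3 * p <= b - a)


def max_hp_runs(n, alphabet):
    """Exhaustive maximum of hp_runs over all words of length n over the given alphabet."""
    return max(hp_runs("".join(w)) for w in product(alphabet, repeat=n))


for n in range(1, 13):
    # observed maximum over binary words vs. the bound floor((n - 1) / 2) of Theorem 1
    print(n, max_hp_runs(n, "ab"), (n - 1) // 2)
```

Since the number of hp-runs is an integer, the bound of Theorem 1 can be compared with $\lfloor (n-1)/2 \rfloor$.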

4 Lower bound 

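Lemma 5. Let $s$ be a word and denote $r = \text{hp-runs}(s)$, $\ell = |s|$.
There exists a sequence of words $(s_n)_{n=0}^{\infty}$ with $s_0 = s$ such that, for

$$r_n = \text{hp-runs}(s_n), \qquad \ell_n = |s_n|,$$

we have

$$\lim_{n\to\infty} \frac{r_n}{\ell_n} = \frac{r}{\ell} + \frac{1}{5\ell}.$$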

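Proof. We define the sequence $s_n$ recursively. Denote $A = \mathrm{Alph}(s_n)$ and let $\bar{A}$ be
a disjoint copy of $A$. By $\bar{s}_n$ we denote the word obtained from $s_n$ by substituting
each letter from $A$ with the corresponding letter from $\bar{A}$. We define $s_{n+1} = (s_n \bar{s}_n)^3$.
Recall that $\ell_0 = \ell$, $r_0 = r$, and note that for $n \ge 1$

$$\ell_n = 6\ell_{n-1}, \qquad r_n = 6r_{n-1} + 1$$

(each of the six factors $s_{n-1}$, $\bar{s}_{n-1}$ keeps its hp-runs, since its neighbours are
over a disjoint alphabet, and $s_n$ itself is one more hp-run, with period $2\ell_{n-1}$).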

By a simple induction this implies that

$$\frac{r_n}{\ell_n} = \frac{r}{\ell} + \sum_{i=1}^{n} \frac{1}{6^i \ell} = \frac{r}{\ell} + \frac{1}{5\ell}\left(1 - \frac{1}{6^n}\right).$$

Taking $n \to \infty$ in the above formula we obtain the conclusion of the lemma. □

Starting with the 3-letter word $s = a^3$, for which $r/\ell = 1/3$, Lemma 5 yields
the bound $0.4n$, since $\frac{1}{3} + \frac{1}{15} = \frac{2}{5}$. This bound is, however, not optimal: we will now show
a sequence of words for which we obtain the bound $0.406n$.
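The construction in the proof of Lemma 5, started from $s = a^3$, can be replayed for a few steps. In this Python sketch, which is an illustration under stated assumptions, letters are encoded as integers so that adding the alphabet size produces the disjoint copy $\bar{A}$; `next_word` and the encoding are hypothetical choices, not taken from the paper. The sketch counts hp-runs by brute force and compares $r_n/\ell_n$ with the closed form $\frac{r}{\ell} + \frac{1}{5\ell}(1 - 6^{-n})$ using exact rational arithmetic.

```python
from fractions import Fraction


def runs(s):
    """Brute-force runs of a sequence s as {(start, end): shortest period}, end exclusive."""
    n, found = len(s), {}
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            if j - i >= p:                          # maximal p-periodic factor of length >= 2p
                found.setdefault((i, j + p), p)     # smallest p first = shortest period
            i = j
    return found


def hp_runs(s):
    return sum(1 for (a, b), p in runs(s).items() if 3 * p <= b - a)


def next_word(s):
    """s_{n+1} = (s_n bar(s_n))^3; letters are ints 0..k-1, and bar() shifts them by k."""
    k = max(s) + 1
    return (list(s) + [c + k for c in s]) * 3


s0 = [0, 0, 0]                    # s = a^3, with the letter a encoded as 0
r, ell = hp_runs(s0), len(s0)     # r = 1, l = 3

rows, word = [], s0
for n in range(3):
    rows.append((n, len(word), hp_runs(word)))
    word = next_word(word)

for n, ln, rn in rows:
    closed = Fraction(r, ell) + (1 - Fraction(1, 6 ** n)) / (5 * ell)
    print(n, ln, rn, Fraction(rn, ln) == closed)

print(float(Fraction(r, ell) + Fraction(1, 5 * ell)))  # 0.4
```

Only the first few terms are computed because the lengths grow as $3 \cdot 6^n$; the limit itself comes from the closed form.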
Let $A = \{a, b\}$. We denote:

$$X = (a^3 b^3)^3, \qquad Y = a^4 b^3 a, \qquad \alpha = XY, \qquad \beta = Xa.$$

Lemma 6. The words $\alpha$ and $\beta$ have the following properties:

– $XYX$ contains a new hp-run with period 7. Hence, each of the pairs
$\alpha\alpha$ and $\alpha\beta$ introduces a new hp-run.

– $\beta$ is a prefix of $\alpha$. Hence, $\alpha\beta\alpha\beta\alpha\alpha$ introduces the hp-run $(\alpha\beta)^3$.

– $Y$ is a prefix of $aX$; therefore $\alpha$ is a prefix of $\beta\alpha$. Hence, $\alpha\alpha\beta\alpha$ introduces
the hp-run $\alpha^3$.
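The properties listed in Lemma 6 are finite statements about concrete words, so they can be checked directly. This Python sketch is an illustration; the helper `has_hp_run_with_period` is a hypothetical name, and it tests whether a word has an hp-run whose shortest period is exactly the given value. The periods used are $7$, $|\alpha\beta| = 45$ and $|\alpha| = 26$.

```python
def runs(s):
    """Brute-force runs of s as {(start, end): shortest period}, 0-based, end exclusive."""
    n, found = len(s), {}
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            if j - i >= p:                          # maximal p-periodic factor of length >= 2p
                found.setdefault((i, j + p), p)     # smallest p first = shortest period
            i = j
    return found


def has_hp_run_with_period(s, p):
    """True if s has an hp-run whose shortest period is exactly p."""
    return any(q == p and 3 * q <= b - a for (a, b), q in runs(s).items())


X = "aaabbb" * 3               # X = (a^3 b^3)^3, length 18
Y = "aaaabbba"                 # Y = a^4 b^3 a, length 8
alpha, beta = X + Y, X + "a"   # lengths 26 and 19

checks = {
    "XYX has an hp-run with period 7": has_hp_run_with_period(X + Y + X, 7),
    "so do alpha alpha and alpha beta": all(has_hp_run_with_period(w, 7) for w in (alpha + alpha, alpha + beta)),
    "beta is a prefix of alpha": alpha.startswith(beta),
    "(alpha beta)^3 in alpha beta alpha beta alpha alpha":
        has_hp_run_with_period(alpha + beta + alpha + beta + alpha + alpha, len(alpha + beta)),
    "Y is a prefix of aX": ("a" + X).startswith(Y),
    "alpha is a prefix of beta alpha": (beta + alpha).startswith(alpha),
    "alpha^3 in alpha alpha beta alpha": has_hp_run_with_period(alpha + alpha + beta + alpha, len(alpha)),
}
for name, ok in checks.items():
    print(ok, name)
```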

Now we will also deal with a new alphabet $A' = \{\alpha, \beta\}$. We define the
Fibonacci morphism $h$ as:

$$h(\alpha) = \alpha\beta, \qquad h(\beta) = \alpha.$$

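Let

$$f_n = h^n(\alpha), \qquad r_n = \text{hp-runs}(f_n), \qquad \ell_n = |f_n|,$$

where $f_n$ is regarded as a word over $A$, i.e. with $\alpha$ and $\beta$ replaced by the words defined above.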
| n | r_n | ℓ_n | r_n/ℓ_n | f_n |
|---|-----|-----|---------|-----|
| 0 | 9 | 26 | 0.3462 | α |
| 1 | 17 | 45 | 0.3778 | αβ |
| 2 | 26 | 71 | 0.3662 | αβα |
| 3 | 45 | 116 | 0.3879 | αβααβ |
| 4 | 71 | 187 | 0.3797 | αβααβαβα |
| 5 | 119 | 303 | 0.3927 | αβααβαβααβααβ |
| 6 | 192 | 490 | 0.3918 | αβααβαβααβααβαβααβαβα |

Table 1: The first few words of the sequence $f_n$ with the corresponding
terms of the sequences $r_n$ and $\ell_n$.
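The rows of Table 1 can be recomputed by brute force. In this Python sketch (an illustration; the symbols `'A'` and `'B'` standing for $\alpha$ and $\beta$, and the names `f`, `WORDS`, `H`, are assumptions of the sketch), $f_n$ is generated by iterating $h$ and then expanded into a word over $\{a, b\}$ before counting hp-runs. The intent is that the printed rows coincide with Table 1.

```python
def runs(s):
    """Brute-force runs of s as {(start, end): shortest period}, 0-based, end exclusive."""
    n, found = len(s), {}
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            if j - i >= p:                          # maximal p-periodic factor of length >= 2p
                found.setdefault((i, j + p), p)     # smallest p first = shortest period
            i = j
    return found


def hp_runs(s):
    return sum(1 for (a, b), p in runs(s).items() if 3 * p <= b - a)


X = "aaabbb" * 3
WORDS = {"A": X + "aaaabbba", "B": X + "a"}   # 'A' stands for alpha, 'B' for beta
H = {"A": "AB", "B": "A"}                     # h(alpha) = alpha beta, h(beta) = alpha


def f(n):
    """f_n = h^n(alpha), written over the symbols 'A' (alpha) and 'B' (beta)."""
    w = "A"
    for _ in range(n):
        w = "".join(H[c] for c in w)
    return w


table = []
for n in range(7):
    word = "".join(WORDS[c] for c in f(n))    # f_n as a word over {a, b}
    r, ell = hp_runs(word), len(word)
    table.append((n, r, ell))
    print(n, r, ell, round(r / ell, 4), f(n))
```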



Theorem 2. 



In particular,

$$\lim_{n\to\infty} \frac{r_n}{\ell_n} > 0.406 \qquad\text{and}\qquad \frac{r_{19}}{\ell_{19}} \ge \frac{103664}{255329} > 0.406.$$



Proof. We start with the values $\ell_n, r_n$ for $n \le 4$, which are precomputed in Table
1, and show that for $n \ge 5$ the following recursive formulas hold:

$$\ell_n = \ell_{n-1} + \ell_{n-2} \tag{1}$$

$$r_n \ge r_{n-1} + r_{n-2} + n - 4 \quad \text{if } 2 \mid n \tag{2}$$

$$r_n \ge r_{n-1} + r_{n-2} + n - 2 \quad \text{if } 2 \nmid n \tag{3}$$



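The "in particular" part of the theorem is a straightforward consequence of these
formulas: for example, iterating (1)–(3) from the values in Table 1 gives
$\ell_{19} = 255329$ and $r_{19} \ge 103664$.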

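Formula (1) is obvious; therefore we concentrate on the inequalities for $r_n$. The
recursive part of each of them ($r_{n-1} + r_{n-2}$) is a consequence of the formula
$f_n = f_{n-1} f_{n-2}$ and the fact that Fibonacci words contain repetitions of exponent
at most $2 + \varphi < 4$, see [14]. Due to Lemma 6, for even values of $n$ a new
hp-run is introduced upon concatenation; see the example for $n = 6$, where $\mid$ marks
the boundary between $f_5$ and $f_4$ and the factor $\alpha\beta\alpha\beta\alpha\alpha$ of Lemma 6 is underlined:

$$f_6 = \alpha\beta\alpha\alpha\beta\alpha\beta\alpha\alpha\beta\alpha\,\underline{\alpha\beta \mid \alpha\beta\alpha\alpha}\,\beta\alpha\beta\alpha$$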



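and for odd values of $n$, three more hp-runs appear (coming from the factors $\alpha\alpha$,
$\alpha\alpha\beta\alpha$ and $\alpha\beta\alpha\beta\alpha\alpha$ of Lemma 6, underlined below, with $\mid$ marking the
boundary between $f_4$ and $f_3$), as in the following example for $n = 5$:

$$f_5 = \alpha\beta\alpha\alpha\beta\alpha\beta\,\underline{\alpha \mid \alpha}\,\beta\alpha\alpha\beta$$

$$f_5 = \alpha\beta\alpha\alpha\beta\alpha\beta\,\underline{\alpha \mid \alpha\beta\alpha}\,\alpha\beta$$

$$f_5 = \alpha\beta\alpha\,\underline{\alpha\beta\alpha\beta\alpha \mid \alpha}\,\beta\alpha\alpha\beta$$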



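Apart from that, since

$$h(\alpha\beta\alpha\beta\alpha\alpha) = \alpha\beta\alpha\alpha\beta\alpha\alpha\beta\alpha\beta$$

contains the hp-run $f_2^3 = (\alpha\beta\alpha)^3$, the word $f_n$ introduces $n - 5$ new hp-runs of the form
$f_2^3, f_3^3, \ldots, f_{n-4}^3$, each created by iterating $h^i(\alpha\beta\alpha\beta\alpha\alpha)$; see the example for
$n = 7$ (the hp-runs $f_2^3$ and $f_3^3$ are underlined, and $\mid$ marks the boundary between $f_6$ and $f_5$):

$$f_7 = \alpha\beta\alpha\alpha\beta\alpha\beta\alpha\alpha\beta\alpha\alpha\beta\alpha\beta\alpha\alpha\beta\,\underline{\alpha\beta\alpha \mid \alpha\beta\alpha\alpha\beta\alpha}\,\beta\alpha\alpha\beta\alpha\alpha\beta$$

$$f_7 = \alpha\beta\alpha\alpha\beta\alpha\beta\alpha\,\underline{\alpha\beta\alpha\alpha\beta\alpha\beta\alpha\alpha\beta\alpha\beta\alpha \mid \alpha\beta}\,\alpha\alpha\beta\alpha\beta\alpha\alpha\beta\alpha\alpha\beta$$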



In total, we obtain $1 + (n - 5) = n - 4$ new hp-runs for even $n$ and $3 + (n - 5) = n - 2$ for odd $n$, which
concludes the proof of the inequalities. □
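The numerical part of Theorem 2 follows from iterating formulas (1)–(3). The short Python sketch below is an illustration; the list names are hypothetical, and `r_low` holds the lower bounds on $r_n$ implied by (2) and (3), not exact counts. It starts from the values in Table 1 for $n \le 4$ and uses exact fractions to compare $r_{19}/\ell_{19}$ with $0.406$.

```python
from fractions import Fraction

ell = [26, 45, 71, 116, 187]   # l_0..l_4 from Table 1
r_low = [9, 17, 26, 45, 71]     # r_0..r_4 from Table 1

for n in range(5, 20):
    ell.append(ell[n - 1] + ell[n - 2])                  # formula (1)
    extra = n - 4 if n % 2 == 0 else n - 2               # formulas (2) and (3)
    r_low.append(r_low[n - 1] + r_low[n - 2] + extra)    # lower bound on r_n

print(r_low[19], ell[19], Fraction(r_low[19], ell[19]) > Fraction(406, 1000))  # 103664 255329 True
```

The recursion reproduces $r_5 = 119$ and $r_6 = 192$ from Table 1, where the bounds happen to be attained.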